Coordinate-descent for learning orthogonal matrices through Givens rotations
Authors
Abstract
Optimizing over the set of orthogonal matrices is a central component in problems like sparse PCA or tensor decomposition. Unfortunately, such optimization is hard, since simple operations on orthogonal matrices easily break orthogonality, and restoring orthogonality is usually computationally expensive. Here we propose a framework for optimizing over orthogonal matrices that parallels coordinate descent in Euclidean spaces. It is based on Givens rotations, a fast-to-compute operation that affects only a small number of entries in the learned matrix and preserves orthogonality. We show two applications of this approach: an algorithm for the tensor decompositions used in learning mixture models, and an algorithm for sparse PCA. We study the parameter regime where the Givens-rotation approach converges faster and achieves a superior model on a genome-wide, brain-wide mRNA expression dataset.
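To make the idea concrete, here is a minimal NumPy sketch (not the paper's implementation) of a single coordinate step. Multiplying U on the right by a Givens rotation G(i, j, θ) touches only columns i and j and preserves UᵀU = I exactly; how (i, j, θ) is chosen to decrease the objective is what each application-specific algorithm supplies.

import numpy as np

def givens_step(U, i, j, theta):
    # Rotate columns i and j of U in their plane: U <- U G(i, j, theta).
    # Only these two columns change, and G is orthogonal, so U stays orthogonal.
    c, s = np.cos(theta), np.sin(theta)
    ui, uj = U[:, i].copy(), U[:, j].copy()
    U[:, i] = c * ui - s * uj
    U[:, j] = s * ui + c * uj
    return U

# Usage: start from any orthogonal matrix; orthogonality survives each step.
d = 4
U = np.eye(d)
U = givens_step(U, 0, 2, np.pi / 5)
assert np.allclose(U.T @ U, np.eye(d))

Each such step costs O(d) arithmetic on two columns, which is the appeal over methods that perturb the full matrix and then pay for re-orthogonalization.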
Similar resources
Efficient coordinate-descent for orthogonal matrices through Givens rotations
Optimizing over the set of orthogonal matrices is a central component in problems like sparse PCA or tensor decomposition. Unfortunately, such optimization is hard, since simple operations on orthogonal matrices easily break orthogonality, and restoring orthogonality is usually computationally expensive. Here we propose a framework for optimizing over orthogonal matrices that parallels...
Supplemental material of Coordinate-descent for learning orthogonal matrices through Givens rotations
Theorem 1 (Convergence to a local optimum). (a) The sequence of iterates U_t of Algorithm 4 satisfies lim_{t→∞} ||∇f(U_t)|| = 0. This means that the accumulation points of the sequence {U_t}_{t=1}^∞ are critical points of f. (b) Assume the critical points of f are isolated. Let U* be a critical point of f. Then U* is a local minimum of f if and only if it is asymptotically stable with regard to the sequenc...
Supplemental material for: Efficient coordinate-descent for orthogonal matrices through Givens rotations
Definition 1 (Riemannian gradient). The Riemannian gradient ∇f(U) of f at a point U ∈ O_d is the matrix UΩ, where Ω ∈ Skew(d), Ω_ji = −Ω_ij = ∇_ij f(U) for 1 ≤ i < j ≤ d is the directional derivative defined in Eq. 1 of the main text, and Ω_ii = 0. The norm of the Riemannian gradient is ||∇f(U)|| = √Tr(∇f(U)∇f(U)ᵀ) = ||Ω||_Fro. Definition 2. A point U* ∈ O_d is asymptotically stable with respect to Algorithm 1...
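Read as a recipe, Definition 1 says: fill a skew-symmetric Ω with the directional derivatives, then multiply by U. A hedged NumPy sketch of that recipe follows; the callback name directional_derivative is a hypothetical stand-in for ∇_ij f(U) from Eq. 1 of the main text.

import numpy as np

def riemannian_gradient(U, directional_derivative):
    # Assemble ∇f(U) = U Ω per Definition 1. directional_derivative(U, i, j)
    # is a hypothetical callback returning ∇_ij f(U) (Eq. 1 of the main text).
    d = U.shape[0]
    Omega = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            g = directional_derivative(U, i, j)
            Omega[j, i] = g    # Ω_ji = ∇_ij f(U)
            Omega[i, j] = -g   # skew-symmetry: Ω_ij = -Ω_ji
    grad = U @ Omega
    # Since U is orthogonal, ||∇f(U)|| = ||UΩ||_Fro = ||Ω||_Fro.
    return grad, np.linalg.norm(Omega, "fro")

The returned Frobenius norm is the quantity that Theorem 1(a) drives to zero.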
NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET Specifying Gaussian Markov Random Fields with Incomplete Orthogonal Factorization using Givens Rotations
In this paper an approach for finding a sparse incomplete Cholesky factor through an incomplete orthogonal factorization with Givens rotations is discussed and applied to Gaussian Markov random fields (GMRFs). The incomplete Cholesky factor obtained from the incomplete orthogonal factorization is usually sparser than the commonly used Cholesky factor obtained through the standard Cholesky facto...
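For intuition on the connection used in that paper: a full Givens QR of A yields an upper-triangular R with AᵀA = RᵀR, so R is exactly the Cholesky factor of AᵀA, and dropping small fill-in during the rotations yields an incomplete factor. The sketch below illustrates that idea only; the drop_tol thresholding is an assumed stand-in, not the paper's dropping rule.

import numpy as np

def incomplete_givens_R(A, drop_tol=0.0):
    # Reduce A to upper-triangular R with Givens rotations; with drop_tol = 0
    # this is plain QR, so A^T A = R^T R and R is the Cholesky factor of A^T A.
    # Zeroing entries below drop_tol after each rotation is a crude stand-in
    # for a real incomplete-factorization dropping strategy.
    R = np.array(A, dtype=float)
    m, n = R.shape
    for j in range(n):
        for i in range(m - 1, j, -1):
            if R[i, j] == 0.0:
                continue
            r = np.hypot(R[j, j], R[i, j])
            c, s = R[j, j] / r, R[i, j] / r
            row_j, row_i = R[j, :].copy(), R[i, :].copy()
            R[j, :] = c * row_j + s * row_i
            R[i, :] = -s * row_j + c * row_i  # zeroes R[i, j]
            if drop_tol > 0.0:
                R[np.abs(R) < drop_tol] = 0.0  # drop small fill-in
    return np.triu(R[:n, :])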
Scalable streaming learning of dyadic relationships
Modern machine learning problems are often about learning a mapping between two high-dimensional domains. For example, learning a mapping from the domain of images to the domain of words, or learning which documents are similar to others. We refer to these mappings as dyadic relationships. In this thesis we focus on learning dyadic relationships encoded in matrix form. For example, a matrix map...
Publication date: 2014